Reading the data

The data for this analysis was created in folder 2019-03-29.

args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 0) {
   stop("At least one argument must be supplied: input file.", call.=FALSE)
} else if (length(args) == 1) {
   args[2] <- "."
}

input_file_name <- args[1]
output_dir <- args[2]
counts <- read.table(input_file_name, row.names=1, header=TRUE)

Three factors: population, diapause or hatching condition, and selective regime. I prefer the immediate hatching (instead of forced diapause) and the regular or periodic selective regime as the reference levels of the corresponding factors.

population <- factor(c("1","1","2","2","3","3","4","4","5","5","6","6"))
NestedPop  <- factor(c("1","1","2","2","1","1","3","3","2","2","3","3"))
diapause   <- factor(rep(c("hatch","diapause"), 6))
diapause   <- relevel(diapause, "hatch")
regime     <- factor(c("random","random","random","random","regular","regular",
                 "random","random","regular","regular","regular","regular"))
regime     <- relevel(regime, "regular")
combined   <- factor(c("RandHatch","RandDiap","RandHatch","RandDiap","RegulHatch","RegulDiap",
                      "RandHatch","RandDiap","RegulHatch","RegulDiap","RegulHatch","RegulDiap"))
combined   <- relevel(combined, "RegulHatch")

y <- DGEList(counts=counts, group=regime)
y <- calcNormFactors(y, method="TMM")

barplot(y$samples$lib.size*1e-6, names=1:12, ylab="Library size (millions)")

It looks like the populations undergoing diapause systematically produced smaller libraries. It may be the effect of a smaller amount of eggs available. Population 5 is the exception.

Filtering

The recommended filtering consists on removing genes that have such a low expression level that do not have at least 5-10 counts on as many samples as groups there are. The rationale is that those genes cannot have enough reads in at least one sample of each group. However, instead of filtering on the raw counts, the edgeR manual suggests to translate whatever minimum number of reads per library in terms of counts per million. The reason is that the filtering should take into account differences in library sizes between samples. It probably makes sense, because otherwise, I would be applying an effectively higher (harsher) minimum threshold to the gene expression level in samples with smaller libraries. That would probably bias results, since we would be enriching the dataset in genes more highly expressed in some samples than in others.

Say that library sizes range between 250000 and 2000000. Then, applying a minimum expression level of 20 counts per million is the only way to make sure that even the small libraries are required to contain at least 5 counts, while at the same time setting the cutoff at the same biologically relevant level. In any case, as long as the minimum expression level is not enforced in all samples, the dataset retained will contain genes that do not reach the minimum in some samples. Since counts per million are (expectedly and actually) similarly distributed among samples, it should not be the case that any sample is enriched in low-expression genes.

threshold <- 5.0 / (min(y$samples$lib.size) / 1000000)
keep <- rowSums(cpm(y) > threshold) >= 4
y <- y[keep, ,keep.lib.sizes=FALSE]

plotMDS(y$counts)

The magnitude of variation along the axes seems very high in comparison with the examples available in the EdgeR manual, suggesting wild gene expression differences. With the exception of sample X5A_S2, dimension 1 separates samples undergoing forced diapause (named with a “C”, mostly on the left half of the plot) from those hatching right away (named with an “A”, mostly on the right). The second dimension is also meaningful, since it almost separates populations 1, 2, and 4, which underwent random regime, from 3, 5 and 6 (regular regime). Sample X2A_S7 is the main exception.

layout(mat=matrix(c(1,2,3,4,5,6,7,8,9,10,11,12), nrow=4, ncol=3, byrow=TRUE), width=c(1,1,1), height=c(1,1,1,1))
for (i in 1:12) {
   plotMD(cpm(y, log=TRUE), column=i)
}

layout(mat=matrix(c(1), ncol=1, nrow=1))

A lot of genes seem to be differentially expressed. The total number of tags passing the filters is 18497. Let’s define a variable to store this value:

NumOfTags <- dim(y)[1]

Estimating the Biological Coefficient of Variation

I need different models:

design1 <- model.matrix(~population + diapause)
design2 <- model.matrix(~regime + diapause)
design3 <- model.matrix(~0 + combined)
design4 <- model.matrix(~regime + diapause + regime:NestedPop)
design5 <- model.matrix(~regime + diapause + regime:diapause)
design6 <- model.matrix(~regime + diapause + regime:diapause + regime:NestedPop)

y1 <- estimateDisp(y, design1)
y2 <- estimateDisp(y, design2)
y3 <- estimateDisp(y, design3)
y4 <- estimateDisp(y, design4)
y5 <- estimateDisp(y, design5)
y6 <- estimateDisp(y, design6)

layout(mat=matrix(c(1,2,3,4,5,6), ncol=3, nrow=2, byrow=TRUE))
plotBCV(y1, main='Model 1')
plotBCV(y2, main='Model 2')
plotBCV(y3, main='Model 3')
plotBCV(y4, main='Model 4')
plotBCV(y5, main='Model 5')
plotBCV(y6, main='Model 6')

Fitting the models

fit1 <- glmQLFit(y1, design1)
fit2 <- glmQLFit(y2, design2)
fit3 <- glmQLFit(y3, design3)
fit4 <- glmQLFit(y4, design4)
fit5 <- glmQLFit(y5, design5)
fit6 <- glmQLFit(y6, design6)

layout(mat=matrix(c(1,2,3,4,5,6), ncol=3, nrow=2, byrow=TRUE))
plotQLDisp(fit1, main='Model 1')
plotQLDisp(fit2, main='Model 2')
plotQLDisp(fit3, main='Model 3')
plotQLDisp(fit4, main='Model 4')
plotQLDisp(fit5, main='Model 5')
plotQLDisp(fit6, main='Model 6')

layout(mat=matrix(c(1,2,3,4,5,6), ncol=3, nrow=2, byrow=TRUE))
gof(fit1, plot=TRUE, main='Model 1')
gof(fit2, plot=TRUE, main='Model 2')
gof(fit3, plot=TRUE, main='Model 3')
gof(fit4, plot=TRUE, main='Model 4')
gof(fit5, plot=TRUE, main='Model 5')
gof(fit6, plot=TRUE, main='Model 6')

Genes affected by hatching condition

Model 1

Model 1 is expected to be the most sensitive to detect genes the expression of which is affected by enforcing diapause, after accounting for the variation among populations (which includes variation due to selective regime).

qlf1 <- glmQLFTest(fit1)
topTags(qlf1)
## Coefficient:  diapausediapause 
##                 logFC      logCPM        F       PValue       FDR
## XLOC_052427 -2.862577 -0.03880226 74.25517 1.248169e-05 0.2308739
## XLOC_045087 -2.092068 -0.62193946 61.97451 2.574843e-05 0.2381344
## XLOC_046298 -2.137567  2.03533181 47.20612 7.444936e-05 0.4079983
## XLOC_044305 -1.793393  0.74051195 42.36960 1.122658e-04 0.4079983
## XLOC_039886 -1.073438  3.55924117 39.75017 1.426172e-04 0.4079983
## XLOC_031542 -1.315396  3.43871982 37.38349 1.791055e-04 0.4079983
## XLOC_039932 -2.229434  0.57344448 37.22741 1.818926e-04 0.4079983
## XLOC_008112 -1.069522  3.44918221 36.21625 2.012926e-04 0.4079983
## XLOC_005549 -1.399274  0.90839738 35.58890 2.146081e-04 0.4079983
## XLOC_027320 -2.327739 -1.25310508 34.57151 2.385751e-04 0.4079983
z1 <- topTags(qlf1, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("(p value)")))) +
   ggtitle('Effect of enforcing diapause, model 1')

I would say the two first genes (XLOC_052427 and XLOC_045087) are significantly downregulated when enforcing diapause, with respect to the immediate hatching condition.

Model 2

Model 2 is very simple, and may include false positives, because it does not account for any interaction, nor for among-population variance.

qlf2 <- glmQLFTest(fit2, coef=3)
topTags(qlf2)
## Coefficient:  diapausediapause 
##                  logFC      logCPM        F       PValue       FDR
## XLOC_046298 -2.1934422  2.03326110 51.93989 7.891698e-06 0.0570513
## XLOC_045087 -2.0997097 -0.62715097 51.05617 8.624217e-06 0.0570513
## XLOC_052427 -2.8514007 -0.04558558 50.36423 9.253063e-06 0.0570513
## XLOC_039932 -2.2421986  0.56806964 39.82751 3.016192e-05 0.1394763
## XLOC_005549 -1.4557679  0.90508954 33.88444 6.573878e-05 0.1864978
## XLOC_011476 -0.9558098  2.80761818 33.59469 6.846023e-05 0.1864978
## XLOC_031542 -1.3570219  3.43794075 33.35405 7.081991e-05 0.1864978
## XLOC_027320 -2.2743372 -1.25974811 32.44203 8.066079e-05 0.1864978
## XLOC_044305 -1.7886843  0.73658501 30.90979 1.009973e-04 0.2039456
## XLOC_039925 -2.2624738 -0.39613265 30.32658 1.102588e-04 0.2039456
z1 <- topTags(qlf2, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("(p value)")))) +
   ggtitle('Effect of enforcing diapause, model 2')

According to model 2, there are 3 tags significantly up- or downregulated by hatching condition, at FDR=0.1.

Model 3

In model 3, the effect of diapause may be understood in two different ways. First, we may want to know what genes have an average expression level different between hatching conditions. This is different from testing the effect of hatching condition twice: first among samples in a regular environment and then among those in the stochastic environment. Then, I could select the genes with the same significant tendency in both selective regimes. I will limit this analysis to the first option, comparing only average gene expression level.

qlf3 <- glmQLFTest(fit3, contrast=c(-0.5, 0.5, -0.5, 0.5))
topTags(qlf3)
## Coefficient:  -0.5*combinedRegulHatch 0.5*combinedRandDiap -0.5*combinedRandHatch 0.5*combinedRegulDiap 
##                 logFC      logCPM        F       PValue       FDR
## XLOC_039932 -2.416741  0.56620035 51.59509 1.359728e-05 0.1183499
## XLOC_045087 -2.165847 -0.62906850 49.75699 1.617579e-05 0.1183499
## XLOC_046298 -2.190837  2.03257170 46.84209 2.154356e-05 0.1183499
## XLOC_052427 -2.877561 -0.04800235 45.15838 2.559331e-05 0.1183499
## XLOC_005549 -1.471140  0.90395245 31.56960 1.291712e-04 0.4091716
## XLOC_031542 -1.358202  3.43768730 30.44766 1.510495e-04 0.4091716
## XLOC_011476 -0.956062  2.80722007 30.27242 1.548468e-04 0.4091716
## XLOC_039925 -2.376220 -0.39740095 29.11401 1.829706e-04 0.4199041
## XLOC_044305 -1.793839  0.73522984 27.89970 2.191245e-04 0.4199041
## XLOC_027320 -2.212433 -1.26225398 27.66599 2.270120e-04 0.4199041
z1 <- topTags(qlf3, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("(p value)")))) +
   ggtitle('Effect of enforcing diapause, model 3')

According to model 3, there are 0 tags significantly up- or downregulated by hatching condition at FDR = 0.1.

Model 4

qlf4 <- glmQLFTest(fit4, coef=3)
topTags(qlf4)
## Coefficient:  diapausediapause 
##                 logFC      logCPM        F       PValue       FDR
## XLOC_052427 -2.862577 -0.03880226 74.25517 1.248169e-05 0.2308739
## XLOC_045087 -2.092068 -0.62193946 61.97451 2.574843e-05 0.2381344
## XLOC_046298 -2.137567  2.03533181 47.20612 7.444936e-05 0.4079984
## XLOC_044305 -1.793393  0.74051195 42.36960 1.122658e-04 0.4079984
## XLOC_039886 -1.073438  3.55924117 39.75017 1.426172e-04 0.4079984
## XLOC_031542 -1.315396  3.43871982 37.38349 1.791055e-04 0.4079984
## XLOC_039932 -2.229434  0.57344448 37.22741 1.818926e-04 0.4079984
## XLOC_008112 -1.069522  3.44918221 36.21625 2.012926e-04 0.4079984
## XLOC_005549 -1.399274  0.90839738 35.58890 2.146081e-04 0.4079984
## XLOC_027320 -2.327739 -1.25310508 34.57151 2.385751e-04 0.4079984
z1 <- topTags(qlf4, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("(p value)")))) +
   ggtitle('Effect of enforcing diapause, model 4')

According to model 4, there are 0 tags significantly up- or downregulated by hatching condition at FDR = 0.1.

Model 5

qlf5 <- glmQLFTest(fit5, coef=3)
topTags(qlf5)
## Coefficient:  diapausediapause 
##                  logFC      logCPM        F       PValue       FDR
## XLOC_046298 -2.2208975  2.03257170 25.66097 0.0003103538 0.8615771
## XLOC_052427 -2.7733434 -0.04800235 25.29651 0.0003291244 0.8615771
## XLOC_045087 -1.9059084 -0.62906850 24.01320 0.0004067258 0.8615771
## XLOC_027320 -2.4408746 -1.26225398 20.86498 0.0007086469 0.8615771
## XLOC_039932 -1.7647818  0.56620035 17.70824 0.0013133786 0.8615771
## XLOC_039886 -1.1543798  3.55817365 16.00059 0.0018901224 0.8615771
## XLOC_011476 -0.9436807  2.80722007 15.14173 0.0022910345 0.8615771
## XLOC_039925 -2.0596861 -0.39740095 14.97445 0.0023803449 0.8615771
## XLOC_005549 -1.3491437  0.90395245 14.83294 0.0024591126 0.8615771
## XLOC_051241  0.7232360  4.03455535 14.53545 0.0026349465 0.8615771
z1 <- topTags(qlf5, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("(p value)")))) +
   ggtitle('Effect of enforcing diapause, model 5')

According to model 5, there are 0 tags significantly up- or downregulated by hatching condition at FDR = 0.1.

Model 6

qlf6 <- glmQLFTest(fit6, coef=3)
topTags(qlf6)
## Coefficient:  diapausediapause 
##                 logFC      logCPM        F       PValue       FDR
## XLOC_052427 -2.782851 -0.04348303 36.74881 0.0003507945 0.8762488
## XLOC_045087 -1.904357 -0.62550917 30.14265 0.0006612656 0.8762488
## XLOC_046298 -2.149860  2.03388091 22.12908 0.0016986145 0.8762488
## XLOC_050455  1.911918 -1.73035181 21.72935 0.0017924047 0.8762488
## XLOC_027320 -2.492942 -1.25762929 21.65528 0.0018104944 0.8762488
## XLOC_018887 -1.023882  2.41333434 20.97668 0.0019874586 0.8762488
## XLOC_039886 -1.154353  3.55867346 20.66676 0.0020755324 0.8762488
## XLOC_044305 -1.749829  0.73778255 19.20150 0.0025655416 0.8762488
## XLOC_017314 -1.507200  1.67498895 17.65544 0.0032523479 0.8762488
## XLOC_053013 -1.890138 -1.60856602 17.64418 0.0032581490 0.8762488
z1 <- topTags(qlf6, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("(p value)")))) +
   ggtitle('Effect of enforcing diapause, model 6')

According to model 6, there are 0 tags significantly up- or downregulated by hatching condition at FDR = 0.10. And this is how the results from models 2, 4, 5 and 6 match:

significant <- cbind(model2 = topTags(qlf2, n=NumOfTags)$table$FDR <= 0.1, 
                     model3 = topTags(qlf3, n=NumOfTags)$table$FDR <= 0.1,
                     model4 = topTags(qlf4, n=NumOfTags)$table$FDR <= 0.1,
                     model5 = topTags(qlf5, n=NumOfTags)$table$FDR <= 0.1,
                     model6 = topTags(qlf6, n=NumOfTags)$table$FDR <= 0.1)
vennDiagram(vennCounts(significant))

Interaction between selective regime and hatching condition

Model 6

qlf6 <- glmQLFTest(fit6, coef=4)
topTags(qlf6)
## Coefficient:  regimerandom:diapausediapause 
##                  logFC     logCPM        F      PValue       FDR
## XLOC_024210 -2.4456751 -1.5271896 15.55546 0.004605562 0.9998194
## XLOC_029615  0.9438359  5.1117353 15.31782 0.004800492 0.9998194
## XLOC_034421 -1.6537942 -0.7614238 13.26311 0.007010684 0.9998194
## XLOC_029614  0.7651341  4.3502569 12.56990 0.008038787 0.9998194
## XLOC_025062 -1.7375444 -0.7541641 12.40875 0.008304635 0.9998194
## XLOC_020818  0.6104000  6.1257989 12.18356 0.008695084 0.9998194
## XLOC_045709  0.8500404  7.3552494 12.16057 0.008736232 0.9998194
## XLOC_049254 -3.1327600 -1.7547226 11.83037 0.009355502 0.9998194
## XLOC_009802 -1.5751722 -0.7860737 11.34519 0.010369981 0.9998194
## XLOC_040002 -1.5088722 -0.8357054 10.77032 0.011759650 0.9998194
z1 <- topTags(qlf6, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) + 
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("(p value)")))) +
   ggtitle('Interaction regime-hatching')

According to model 6, there are 0 tags with a significant interaction term between selective regime and hatching condition at FDR = 0.25.

Model 5

qlf5 <- glmQLFTest(fit5, coef=4)
topTags(qlf5)
## Coefficient:  regimerandom:diapausediapause 
##                  logFC     logCPM         F      PValue       FDR
## XLOC_024210 -2.4315157 -1.5288213 14.668733 0.002554398 0.9999704
## XLOC_009802 -1.5876502 -0.7871982 13.976584 0.003007000 0.9999704
## XLOC_040002 -1.5020404 -0.8356154 12.813476 0.003999777 0.9999704
## XLOC_020167 -2.9274569 -1.6229634 12.629838 0.004189766 0.9999704
## XLOC_034421 -1.6632933 -0.7631284 11.254892 0.006007371 0.9999704
## XLOC_044803  2.2898771  1.3809934 10.998002 0.006442977 0.9999704
## XLOC_043930  1.9173729 -1.7783709 10.673413 0.007047922 0.9999704
## XLOC_051217 -2.7884765 -1.3106881  9.212432 0.010760719 0.9999704
## XLOC_051241 -0.8044339  4.0345554  9.019197 0.011409157 0.9999704
## XLOC_030765  2.5647431 -1.8677254  8.974542 0.011565526 0.9999704
z1 <- topTags(qlf5, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("p value)")))) + 
   ggtitle('Interaction regime-hatching in model 5')

According to model 5, there are 0 tags with a significant interaction term between selective regime and hatching condition at FDR = 0.25. And this is how models 5 and 6 agree on what genes have such significant interaction term:

significant <- cbind(model5 = topTags(qlf5, n=NumOfTags)$table$FDR <= 0.25,
                     model6 = topTags(qlf6, n=NumOfTags)$table$FDR <= 0.25)
vennDiagram(vennCounts(significant))

Interaction between selective regime and population

When testing for significant coefficients showing the effect of the nested population, there are not one but 4 possible coefficients, each with a different fold change. For the volcano plot, I want to use for each gene the fold change that is most extreme (either positive or negative).

Model 6

qlf6  <- glmQLFTest(fit6, coef=5:8)
topTags(qlf6)
## Coefficient:  regimeregular:NestedPop2 regimerandom:NestedPop2 regimeregular:NestedPop3 regimerandom:NestedPop3 
##             logFC.regimeregular.NestedPop2 logFC.regimerandom.NestedPop2
## XLOC_017180                     -0.5333835                   -2.37175684
## XLOC_001420                      0.6579139                    0.20077271
## XLOC_020688                     -2.7007688                    3.00443751
## XLOC_018218                     -0.6086934                   -3.72876583
## XLOC_044910                      1.1213577                   -6.37988314
## XLOC_020921                     -1.0612256                    2.40337952
## XLOC_031557                      0.3136556                   -0.07845153
## XLOC_044284                      0.2139017                   -3.84114250
## XLOC_016593                     -0.1878020                    2.32269680
## XLOC_031928                     -2.6973907                   -3.35114756
##             logFC.regimeregular.NestedPop3 logFC.regimerandom.NestedPop3
## XLOC_017180                    -1.45434082                     6.6626442
## XLOC_001420                     0.22957285                     2.2849336
## XLOC_020688                    -0.93160432                    -0.1785064
## XLOC_018218                     0.03406936                    -1.9803567
## XLOC_044910                     0.79545835                     0.3778997
## XLOC_020921                    -0.41787152                     2.2150580
## XLOC_031557                     0.29517354                    -3.7733669
## XLOC_044284                     0.18027177                    -0.4072234
## XLOC_016593                     0.42114057                     1.1601085
## XLOC_031928                    -1.27354445                     0.8327480
##               logCPM         F       PValue         FDR
## XLOC_017180 2.028242 114.34713 6.625782e-07 0.008774226
## XLOC_001420 6.601577  93.39464 1.420066e-06 0.008774226
## XLOC_020688 4.257620  93.34199 1.423078e-06 0.008774226
## XLOC_018218 4.493171  78.97010 2.663740e-06 0.012317802
## XLOC_044910 2.180257  66.71030 4.998997e-06 0.015650489
## XLOC_020921 4.827533  65.96669 5.211885e-06 0.015650489
## XLOC_031557 4.128848  63.73638 5.922767e-06 0.015650489
## XLOC_044284 5.617176  56.52216 9.245204e-06 0.020959552
## XLOC_016593 4.501459  55.04180 1.019819e-05 0.020959552
## XLOC_031928 1.353020  50.94923 1.356103e-05 0.025083830
z1 <- topTags(qlf6, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
z1$logFC <- z1[cbind(1:NumOfTags, apply(abs(z1[,1:4]), 1, which.max))]
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("p value)")))) +
   ggtitle('Interaction regime-population in complete model')

According to model 6, there are 3 tags with a significant interaction term between selective regime and population, at FDR = 0.01.

Model 4

qlf4 <- glmQLFTest(fit4, coef=4:7)
topTags(qlf4)
## Coefficient:  regimeregular:NestedPop2 regimerandom:NestedPop2 regimeregular:NestedPop3 regimerandom:NestedPop3 
##             logFC.regimeregular.NestedPop2 logFC.regimerandom.NestedPop2
## XLOC_017180                     -0.5392175                   -2.37153382
## XLOC_001420                      0.6577676                    0.20065937
## XLOC_020688                     -2.6977056                    3.00636838
## XLOC_018218                     -0.6082729                   -3.72626438
## XLOC_044910                      1.1190949                   -6.38536151
## XLOC_020921                     -1.0601584                    2.40246321
## XLOC_031557                      0.3167148                   -0.09121439
## XLOC_031928                     -2.6962251                   -3.34303652
## XLOC_044284                      0.2063445                   -3.81403443
## XLOC_016593                     -0.1880893                    2.32665351
##             logFC.regimeregular.NestedPop3 logFC.regimerandom.NestedPop3
## XLOC_017180                    -1.44066224                     6.6580415
## XLOC_001420                     0.22931775                     2.2848926
## XLOC_020688                    -0.92274643                    -0.1803714
## XLOC_018218                     0.03826964                    -1.9735386
## XLOC_044910                     0.79565615                     0.3818658
## XLOC_020921                    -0.41829498                     2.2144309
## XLOC_031557                     0.28812880                    -3.7777783
## XLOC_031928                    -1.27458159                     0.8322147
## XLOC_044284                     0.18373725                    -0.3947533
## XLOC_016593                     0.42231133                     1.1636143
##               logCPM         F       PValue          FDR
## XLOC_017180 2.027386 128.12243 6.655819e-08 0.0009082409
## XLOC_001420 6.601563 111.72571 1.213292e-07 0.0009082409
## XLOC_020688 4.257758 106.88030 1.473062e-07 0.0009082409
## XLOC_018218 4.493302  89.17333 3.246646e-07 0.0015013304
## XLOC_044910 2.179977  77.68291 5.910599e-07 0.0018364137
## XLOC_020921 4.827610  77.54315 5.956902e-07 0.0018364137
## XLOC_031557 4.128825  65.99132 1.196031e-06 0.0031604273
## XLOC_031928 1.352883  59.27867 1.896798e-06 0.0043856351
## XLOC_044284 5.617183  57.54753 2.153752e-06 0.0044264381
## XLOC_016593 4.501495  53.22555 3.007335e-06 0.0055626671
z1 <- topTags(qlf4, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
z1$logFC <- z1[cbind(1:NumOfTags, apply(abs(z1[,1:4]), 1, which.max))]
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("p value)")))) + 
   ggtitle('Interaction regime-population in model 4')

According to model 4, there are 32 tags with a significant interaction term between selective regimew and population, at FDR = 0.01. And this is how the genes identified as having such significant coefficients match between results of models 4 and 6:

significant <- cbind(model4 = topTags(qlf4, n=NumOfTags)$table$FDR <= 0.01,
                     model6 = topTags(qlf6, n=NumOfTags)$table$FDR <= 0.01)
vennDiagram(vennCounts(significant))

Main effect of the selective regime

Model 2

qlf2  <- glmQLFTest(fit2, coef=2)
topTags(qlf2)
## Coefficient:  regimerandom 
##                  logFC     logCPM         F       PValue         FDR
## XLOC_027655  0.9251752  8.9286486 113.13819 1.099287e-07 0.001675758
## XLOC_008190 -2.5310748  2.9391708  96.34454 2.747724e-07 0.001675758
## XLOC_026645 -0.7191677  6.5800094  96.12429 2.783511e-07 0.001675758
## XLOC_028689 -1.8862052  0.5803148  70.32767 1.576704e-06 0.001675758
## XLOC_038903 -2.1291226  1.2166006  68.44789 1.826486e-06 0.001675758
## XLOC_005911  0.9637810  4.3170079  68.29836 1.848251e-06 0.001675758
## XLOC_050984 -0.8177441  6.7958221  67.75819 1.929419e-06 0.001675758
## XLOC_033378 -3.3122199 -0.8898586  66.75531 2.091317e-06 0.001675758
## XLOC_002252 -1.0845811  4.2867859  63.20611 2.805221e-06 0.001675758
## XLOC_032135 -0.8194344  7.5600216  62.73497 2.919794e-06 0.001675758
z1 <- topTags(qlf2, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("p value)")))) +
   ggtitle('Main effect of selective regime')

According to model 2, there are 6574 tags significantly up- or downregulated by selective regime at FDR=0.01.

Model 3

Using model 3 I can look for genes affected by selective regime in two ways. First, I could look for a significant change in average gene expression among samples in the same regime. Alternatively, I could separately look for genes affected by the selective regime among samples in the same hatching condition, and then compare the lists of significant genes. According to Edu, the latter makes more sense biologically, because we expect the selective regime to affect different sets of genes depending on whether eggs are allowed to hatch or forced to undergo diapause.

Effect of selective regime when eggs are allowed to hatch immediately

qlf3a  <- glmQLFTest(fit3, contrast=c(-1, 0, 1, 0))
topTags(qlf3a)
## Coefficient:  -1*combinedRegulHatch 1*combinedRandHatch 
##                  logFC   logCPM        F       PValue        FDR
## XLOC_023953 -0.8646776 6.612040 68.06870 3.491297e-06 0.03709955
## XLOC_030600 -2.2209199 3.836562 64.08018 4.717492e-06 0.03709955
## XLOC_032568 -1.8724888 7.386211 56.84773 8.504965e-06 0.03709955
## XLOC_024285 -2.7914942 2.520220 55.51519 9.546008e-06 0.03709955
## XLOC_026645 -0.7834294 6.580002 54.95422 1.002853e-05 0.03709955
## XLOC_027655  0.9289364 8.928651 51.29145 1.398795e-05 0.04188624
## XLOC_008190 -2.6533318 2.938903 49.13599 1.717377e-05 0.04188624
## XLOC_032135 -0.9722472 7.560018 48.42231 1.841126e-05 0.04188624
## XLOC_002615 -0.7847700 9.044465 47.39542 2.038040e-05 0.04188624
## XLOC_009961 -0.6746834 6.083798 45.92764 2.364122e-05 0.04305634
z1 <- topTags(qlf3a, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("p value)")))) +
   ggtitle('Effect of selective regime, when hatching')

According to model 3, contrast a, there are 2985 tags differentially expressed between selective regimes among samples allowed to hatch immediately, at FDR=0.05.

Effect of selective regime when eggs are forced to undergo diapause

qlf3b  <- glmQLFTest(fit3, contrast=c(0, 1, 0, -1))
topTags(qlf3b)
## Coefficient:  1*combinedRandDiap -1*combinedRegulDiap 
##                  logFC      logCPM        F       PValue        FDR
## XLOC_027655  0.9214069  8.92865105 50.44389 1.515015e-05 0.02867084
## XLOC_008157 -2.2051529 -0.61107917 42.44951 3.414937e-05 0.02867084
## XLOC_020167 -3.7990218 -1.62296342 40.44066 4.270654e-05 0.02867084
## XLOC_028689 -2.1187828  0.57932526 40.05087 4.464602e-05 0.02867084
## XLOC_008190 -2.4035954  2.93890295 39.98808 4.496800e-05 0.02867084
## XLOC_033252 -2.8220194 -0.63710157 39.18981 4.930886e-05 0.02867084
## XLOC_026645 -0.6546516  6.58000182 38.37351 5.426551e-05 0.02867084
## XLOC_047029 -2.7230190 -0.01893114 36.37613 6.907619e-05 0.02867084
## XLOC_037384 -3.0460196 -0.56903814 35.10839 8.094748e-05 0.02867084
## XLOC_031654 -3.2234561 -1.56291053 35.00389 8.202877e-05 0.02867084
z1 <- topTags(qlf3b, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("p value)")))) +
   ggtitle('Effect of selective regime, when hatching')

According to model 3, contrast b, there are 5289 tags differentially expressed between selective regimes among samples undergoing diapause, at FDR=0.05. There is a reasonable correlation between the fold change in the two hatching conditions:

z2 <- merge(topTags(qlf3a, n=NumOfTags)$table,
            topTags(qlf3b, n=NumOfTags)$table,
            by='row.names')
ggplot(data=z2, mapping=aes(x=logFC.x, y=logFC.y, color=FDR.x)) +
   geom_point() + geom_smooth(method='lm') + xlab('logFC hatching') + ylab('logFC diapause')

Average effect of selective regime

qlf3c  <- glmQLFTest(fit3, contrast=c(-0.5, 0.5, 0.5, -0.5))
topTags(qlf3c)
## Coefficient:  -0.5*combinedRegulHatch 0.5*combinedRandDiap 0.5*combinedRandHatch -0.5*combinedRegulDiap 
##                  logFC    logCPM         F       PValue         FDR
## XLOC_027655  0.9251717 8.9286511 101.73353 4.448321e-07 0.003045998
## XLOC_026645 -0.7190405 6.5800018  92.59317 7.266991e-07 0.003045998
## XLOC_008190 -2.5284636 2.9389030  88.90822 8.967784e-07 0.003045998
## XLOC_023953 -0.6841386 6.6120397  85.55357 1.093508e-06 0.003045998
## XLOC_030600 -1.7708987 3.8365623  83.57182 1.233478e-06 0.003045998
## XLOC_024285 -2.1669490 2.5202203  69.60907 3.120830e-06 0.003045998
## XLOC_028689 -1.9034254 0.5793253  69.14219 3.228022e-06 0.003045998
## XLOC_032135 -0.8192658 7.5600175  69.05408 3.248731e-06 0.003045998
## XLOC_009961 -0.5743735 6.0837980  66.56415 3.903928e-06 0.003045998
## XLOC_032568 -1.4070349 7.3862114  65.93200 4.094202e-06 0.003045998
z1 <- topTags(qlf3c, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("p value)")))) +
   ggtitle('Effect of selective regime, when hatching')

According to model 3, contrast c, there are 5289 tags with a significant change (across selective regimes) of average (across hatching conditions) expression levels, at FDR = 0.05.

Model 4

qlf4  <- glmQLFTest(fit4, coef=2)
topTags(qlf4)
## Coefficient:  regimerandom 
##                 logFC     logCPM         F       PValue         FDR
## XLOC_020921 -2.018982  4.8276100 172.62705 3.683861e-07 0.006814037
## XLOC_043877 -3.193746  6.3759197 121.73876 1.621759e-06 0.014998836
## XLOC_037479 -2.154407  3.2163238  85.89128 6.901589e-06 0.042552895
## XLOC_000348  6.666243 -0.4217109  76.82522 1.087520e-05 0.050289658
## XLOC_027395  3.853572  0.3868021  60.48440 2.835411e-05 0.091696650
## XLOC_020688 -1.673719  4.2577578  56.63358 3.674319e-05 0.091696650
## XLOC_009724  2.876944  0.2635389  53.89944 4.459123e-05 0.091696650
## XLOC_016275  1.869734  3.6656431  53.11570 4.721010e-05 0.091696650
## XLOC_025173 -2.719464  1.9088392  52.94530 4.780421e-05 0.091696650
## XLOC_017180 -2.982087  2.0273862  51.55477 5.301047e-05 0.091696650
z1 <- topTags(qlf4, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("p value)")))) +
   ggtitle('Main effect of selective regime, model 4')

According to model 4, there are 65 tags significantly up- or downregulated by selective regime at FDR=0.25.

Model 5

qlf5  <- glmQLFTest(fit5, coef=2)
topTags(qlf5)
## Coefficient:  regimerandom 
##                  logFC   logCPM        F       PValue        FDR
## XLOC_023953 -0.8646776 6.612040 68.06870 3.491297e-06 0.03709955
## XLOC_030600 -2.2209199 3.836562 64.08018 4.717492e-06 0.03709955
## XLOC_032568 -1.8724888 7.386211 56.84773 8.504965e-06 0.03709955
## XLOC_024285 -2.7914942 2.520220 55.51519 9.546008e-06 0.03709955
## XLOC_026645 -0.7834294 6.580002 54.95422 1.002853e-05 0.03709955
## XLOC_027655  0.9289364 8.928651 51.29145 1.398795e-05 0.04188624
## XLOC_008190 -2.6533318 2.938903 49.13599 1.717377e-05 0.04188624
## XLOC_032135 -0.9722472 7.560018 48.42231 1.841126e-05 0.04188624
## XLOC_002615 -0.7847700 9.044465 47.39542 2.038040e-05 0.04188624
## XLOC_009961 -0.6746834 6.083798 45.92764 2.364122e-05 0.04305634
z1 <- topTags(qlf5, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("p value)")))) +
   ggtitle('Main effect of selective regime, model 5')

According to model 5, there are 2985 tags significantly up- or downregulated by selective regime at FDR=0.25.

Model 6

qlf6  <- glmQLFTest(fit6, coef=2)
topTags(qlf6)
## Coefficient:  regimerandom 
##                 logFC     logCPM         F       PValue       FDR
## XLOC_020921 -1.995609  4.8275332 113.03065 7.124887e-06 0.1317890
## XLOC_043877 -3.141690  6.3758644  81.29444 2.328552e-05 0.2153561
## XLOC_037479 -2.148076  3.2157895  55.65069 8.753821e-05 0.4556249
## XLOC_000348  6.749065 -0.4156904  53.29841 1.014816e-04 0.4556249
## XLOC_025173 -2.951951  1.9092244  50.35058 1.231618e-04 0.4556249
## XLOC_043872 -7.580934 -1.2813197  54.66435 1.857266e-04 0.4929989
## XLOC_024723  3.435747  2.0024823  44.04678 1.931546e-04 0.4929989
## XLOC_020688 -1.740333  4.2576198  39.84500 2.692514e-04 0.4929989
## XLOC_027395  3.615407  0.3904945  39.65618 2.734908e-04 0.4929989
## XLOC_021141  3.753503 -0.3644969  39.22427 2.835091e-04 0.4929989
z1 <- topTags(qlf6, n=NumOfTags)$table
z1$logPValue <- -log10(z1$PValue)
ggplot(data=z1, mapping=aes(x=logFC, y=logPValue, color=FDR)) +
   geom_point() + theme_grey() + ylab(expression(paste("-log"[10], plain("p value)")))) +
   ggtitle('Main effect of selective regime, model 6')

According to model 6, there are 2 tags significantly up- or downregulated by selective regime at FDR=0.25. And this is how the different models agree on what genes are differentially expressed in different selective regimes:

significant <- cbind(model2 = topTags(qlf2,  n=NumOfTags)$table$FDR <= 0.01,
                     model3 = topTags(qlf3a, n=NumOfTags)$table$FDR <= 0.05,
                     model4 = topTags(qlf4,  n=NumOfTags)$table$FDR <= 0.25,
                     model5 = topTags(qlf5,  n=NumOfTags)$table$FDR <= 0.05,
                     model6 = topTags(qlf6,  n=NumOfTags)$table$FDR <= 0.25)
vennDiagram(vennCounts(significant))

Note that I am using a different FDR for each model, to obtain more or less comparable and positive numbers.